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ABSTRACT 

The Set Covering Problem (SCP) and the Set Partitioning Problem (SPP) 
represent an important class of all-binary (0-1) Integer Linear Programs 
(ILP). A review of the literature reveals extensive application of the 
SPP/SCP model to a wide set of practical problems. The basic model is 
explained, and then many of the actual applications of this powerful 
model discovered in the literature review are discussed. The problems 
derived from these applications are difficult to solve with any method, 
and are particularly difficult to solve with optimal or exact algorithms. 
Various solution techniques are investigated within the framework of the 
classical simplex method with branch and bound enumeration. Several 
reformulations of the SPP/SCP as Integer Generalized Networks are examined. 
Extensive computational results are reported for several "real world" 
large-scale problems, and a convenient, compact format for data input 
is proposed as a standard for this problem class. 
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I. INTRODUCTION 



The Set Covering Problem (SCP) and the Set Partitioning Problem (SPP) 
represent an important class of all-binary (0-1) Integer Linear Programs 
(HP's). These problems have binary variables, binary constraint 
coefficients and unit or integer resources. 

The basic SPP/SCP model has been known for over 25 years. It is 
enticing in formulation and deceptively simple. A review of the open 
literature reveals extensive application of the SPP/SCP model to a wide 
range of problems, including airline crew scheduling, vehicle routing, 
and facilities location. Even though the model has been intensively 
studied for both its intriguing binary structure and its potential for 
practical application, exact solution technologies for large-scale 
problems were not evident until the work of such researchers as Marsten 
[Ref. 1] began to appear in the early 1970' s. Other early contributors 
are listed by Christofides [Ref. 2]. 

After first defining the basic model and discussing many of its 
applications, several reformulations of the SPP/SCP will be examined. 

Glover and Mulvey [Ref. 3] have presented two reformulations of the 
binary HP as an Integer Generalized Network (GNIP). There is very 
little computational evidence in the literature concerning these reformu- 
lations; therefore, the computational behavior of this approach will be 
tested and results reported for several problems. Another reformulation 
of the SSP/SCP as an Integer Generalized Processing Network (a network 
with special side constraints) will be examined and its potential evaluated. 
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As indicated by the many reformulations, manipulative techniques, and 
heuristic methods appearing in the literature, these problems are diffi- 
cult to solve reliably with any method, and are particularly difficult to 
solve with optimal or exact algorithms. Various solution techniques 
based on the classical simplex method with branch and bound enumeration 
are investigated in this study. Some of the techniques examined are 
basis factorization, elastic programming, enumeration schemes, network and 
linear programming relaxation, logical reduction, and heuristic methods 
for obtaining starting solutions. Extensive computational results are 
reported for several “real world" large-scale problems, and a convenient, 
compact format for data input is proposed as a standard for this problem 
class. 
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II. THE BASIC MOOEL AND MODEL GENERATION 



A. THE BASIC MODEL 

The SCP formulated as an ILP is of the form: 



n 

(1) MIN L C-X, 

j J=1 J J 



n 



(2) s.t. 


£ a ijXj 

j=l J J 


>_ 


b-j i = 1, 


. . . , m 


(3) 


X. 

J 


G 


10,11 j = 1, 


. . . , n 


(4) 


c. 

J 


>_ 


0 


J = 1, 


. . . , n 


(5) 


b. 

l 


>_ 


0 


and integer i 


= 1 , . . . , m 










1 if column j 


covers row i , 


(6) 


a . . 


= 


« 








' J 






0 otherwise. 








A minimal cost set of columns must be selected from X. such that 

J 

the magnitudes of the right-hand sides (RHS), b., are "covered 11 or 
satisfied. If (2) and (5) are replaced by 
n 

(7) £ a i -X • = 1, i = l, m, 

j=l J J 

we have the SPP (sometimes referred to in the literature as the equality 
constrained SCP). This restriction of the SCP exhibits sufficient 
modelling and computational interest to be studied in its own right. For 
the SPP, the rows T i > represent a set which must be partitioned by a 
combination of mutually exclusive columns at minimum cost. 



12 



B. SIDE CONDITIONS 



Many practical applications of the SPP/SCP formulation add logical 
constraints to the basic model discussed above. For example, suppose 
there are p sets of columns S k , k * 1, p in the model and only one 

column from each of the sets S k is eligible to be included in the final 
solution. This restriction will produce constraints 

(8) E X. =1 for all k. 

OcS k J 

In another case, suppose that the solution must include exactly J columns. 
This results in the cardinality constraint 

(9) E X i = J 
j J 

being appended to the basic model. Introductory modelling texts such as 
Wagner [Ref. 4] and Gaver and Thompson [Ref. 5] discuss many such logical 
conditions formulated with binary variables. Any or all of these logical 
conditions can be included to extend the basic model for the purpose 
required. 

C. COLUMN GENERATION 

The art of formulating the practical SPP/SCP lies in the schemes used 
for column generation. It is possible, of course, to generate all Z m - 1 
columns capable of covering or partitioning the rows, but for any relatively 
large number of rows, the problem becomes intractable. This "all oossible 
combinations" formulation is known as the Complete SPP/SCP, and even 
though techniques are emerging for attacking such problems [Ref. 6], 
efforts must be made to keep the number of permissible columns within 
the capabilities of the optimizer being used. Editing reductions can be 
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realized by incorporating such conditions as managerial specifications of 
operating policy; dimensional restrictions on time, distance, and space; 
legal restrictions; labor union restrictions; cash flow restrictions; 
environmental restrictions; and as many other "real world" constraints as 
can be included in the column generation process. 

Incorporating such conditions into the column generator can handle 
most, if not all, side conditions and feasibility issues without including 
them as extensions of the basic SPP/SCP model. Some examples are described 
by Marsten and Muller [Ref. 7], Shanker, Turner, and Zoltners [Ref. 8], 
and Cullen, Jarvis, and Ratliff [Ref. 9]. 

D. THE OBJECTIVE FUNCTION 

The cost coefficients Cj for the basic model can be of two types: 
physical and ordinal. A physical cost is a coefficient in units of 
dollars, miles, time, etc., and represents the cost of covering certain 
rows with column j. The associated physical objective function expresses 
the cost of covering or partitioning the set represented by the rows. 

It is quite often the case, though, that the cost coefficients serve 
only as a means of distinguishing between alternate columns. In many 
political and social models, for example, a column will be assigned a 
subjective number depicting some measure of acceptability (or unacceot- 
ability) thus effecting an ordinal ranking structure in the objective 
function. The objective then becomes a matter of selecting those columns 
which minimize the ordinal ity. A much-used special case of the ordinal 
cost structure is the unit-cost objective function in which C- = 1 for 

J 

all j. The optimal solution for the unit-cost objective function is the 
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minimum number of 
without regard to 



columns capable of covering or partitioning the row set 
physical cost or ordinal ranking. 
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III. APPLICATIONS AND PROBLEM DESCRIPTIONS 



A. APPLICATIONS 

Set Covering and Set Partitioning Problems have been studied extensively 
because of their many practical applications. The surveys by Garfinkel 
and Nemhauser [Ref. 10] and Balas and Padberg [Ref. 11] list many useful 
applications which have appeared in the literature. Some of these are 
listed below, along with a few which have subsequently appeared. 





APPLICATION 


REFERENCE (*Unsighted) 


1 . 


Truck Deliveries 


[Ref. 12],* [Ref. 13],* [Ref. 14]*, 
[Ref. 15], [Ref. 16]*. 


2. 


Tanker Routing 


[Ref. 17]. 


3. 


Aircrew Schedul ing 


[Ref. 18],* [Ref. 19],* [Ref. 20],* 
[Ref. 21],* [Ref. 22]. [Ref. 23], 
[Ref. 7]. 


4. 


Facilities Location 


[Ref. 24], TRef. 25],* [Ref. 26], 
[Ref. 27], [Ref. 28]. 


5. 


Air Fleet Scheduling 


[Ref. 29],* [Ref. 7]. 


6. 


List Selection 


[Ref. 30], 


7. 


Political Districting 


[Ref. 31],* [Ref. 32],* [Ref. 33]. 


3. 


Nuclear and Conventional 
Targeting 


(See Appendix A). 


9. 


Information Retrieval 


[Ref. 34],* [Ref. 35].* 


10. 


Symbolic Logic 


[Ref. 36].* 


11. 


Switching Theory 


TRef. 37], [Ref. 38],* [Ref. 39],* 
[Ref. 40].* 


12. 


Stock Cutting or Trimming 


[Ref. 41].* 


13. 


Line Bal ancing 


[Ref. 42].* 
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APPLICATION 



REFERENCE 



14. Capacity Balancing 

15. PERT-CPM 

16. Frequency Allocation 

17. Tracking Problems 

18. Vehicle Routing 

19. Sales Territory Design 

20. Coloring Problems 

21. Disconnecting Paths in a GraDh 

22. Cyclic Scheduling Problems 



[Ref. 43].* 

[Ref. 36].* 

[Ref. 44]. 

[Ref. 45]. 

[Ref. 9]. 

[Ref. 8]. 

[Ref. 46],* [Ref. 47], [Ref. 48].* 
[Ref. 49], [Ref. 50]. 

[Ref. 51], [Ref. 52]. 



3. THE TRUCK DELIVERY PROBLEM 

The first application we will examine is one that appears quite often 
in textbooks and is a simple illustration of the basic model. This 
problem will also provide an example which will be carried forward 
through discussion in later sections. 

Consider the problem of making deliveries to m locations by truck 
(rail, aircraft, ship, messenger, etc.). There are n feasible routes to 
choose from and a^,. = 1 if location i is on route j. A cost C • (say, 

J J 

time, dollars, miles) is assigned to route j. An optimal partition gives 
a minimal cost routing that makes each delivery exactly once. An optimal 
cover gives a minimal cost routing that makes sufficient deliveries to 
satisfy the demand at each location. The optimal solution to the unit- 
cost problem yields the minimum number of trucks necessary to make the 
required deliveries. 

Table 1 is the explicit tableau for an illustrative example of the 
SPP. The flight scheduler for a small West Coast air freight comDany has 
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TABLE 1. AIR FREIGHT EXAMPLE 





R1 


R2 




R3 


R4 1 


R5 


R6 


R7 


RHS 




Los Angeles 


1 

| 1 
1 


0 




0 


0 


0 


0 


0 


r 

i = i 

i 


1 

1 

I 


San Francisco 


1 

| 1 
1 


1 




1 


0 


0 


0 


0 


1 

i = i 

i 


! 

1 

I 


San Jose 


| 1 
1 


1 




1 


0 


0 


0 


0 


1 

i = i 

i 


1 

| 


Denver 


1 

1 o 
1 


0 




1 


1 


1 


0 


0 


1 

i = i 

i 


1 

| 


Portland 


1 

1 o 
1 


0 




0 


1 


1 


0 


0 


1 

1 = 1 
i 


1 

| 


Seattle 


1 o 
1 


0 




0 


1 


1 


1 


0 


1 = 1 
i 


1 

I 


San Diego 


1 o 
1 


0 




0 


0 


1 


1 


1 


1 

1 = 1 
1 


1 

1 

1 


Costs 


1 

1 o 
1 


0 




0 


0 


6 


7 


4 


T 

1 ob j . 
1 


1 

1 

1 


been assigned the task of 


delivering exactly one of 


seven 


identical 




packages to each of 


seven 


western 


cities by tomorrow morning. 


All of 


the 


del ivery points can 


be reached 


in 


the 


required 


time 


with 


the current 




schedule except San 


Diego 


. There 


are 


only three feasible 


ways 


to make 


the San Diego delivery: 


extend 


route 


5, extend route 6, 


or add a new 




route 7. The cost of the 


alternatives 


is calculated 


and 


appears in the 


tableau. By inspection. 


there 


are 


! only two feasible 


sol utions 


■: (1) 




Routes 1 and 5 at a 


cost 


of 6, 


and 


1 (2) 


Routes 


1, 4, 


and 


7 at 


a cost 


of 



4. The minimum cost solution, therefore, is to add flight 7 to the 
current schedule. The unit-cost solution or minimum partition is to use 
solution (1). 

Two large-scale, real-life problems of this type have been examined 
in this study: TRUCK and TANKER. TRUCK is a nationwide, intercity truck 

routing problem, a large SCP, and fits the basic description above. 



18 



TANKER is a worldwide oil tanker fleet scheduling problem which extends 
the basic SPP model to help choose between company-owned and charter 
tankers to meet refinery delivery requirements from available loading 
volumes and origins. Each cargo, company-owned ship, and potential 
charter vessel is represented by a row. Cargoes must be carried, and 
ships must either be used, or scheduled in demurrage. Each column 
represents a feasible route for a particular ship; during the planning 
horizon, it may carry zero, or more cargoes. The cost of each route may 
be calculated ordinally (based on fleet size) or economically (based on 
operating costs) . 

The problem dimensions for TRUCK and TANKER are listed in Table 2. 
NZEL is the total number of non-zero elements, and NCE is the average 
number of non-zero elements per column. 



TABLE 2. TRUCK DELIVERY PROBLEM DIMENSIONS 
ROWS COLUMNS NZEL NCE MODEL 



TRUCK 


1 

1 

1 


239 


4752 


30075 


8.0 


SCP | 


TANKER 


1 

1 

1 


166 


7563 


31289 


4.1 


SPP | 



C. THE AIRCREW SCHEDULING PROBLEM 

An airline has a set of m flight legs, each of which requires a crew. 
Given the airline's timetable, a set of n possible crew rotations can be 
generated. Each crew rotation is a sequence of scheduled flight segments 
constituting a roundtrip (a sequence departing from and returning to one 
of the airline's crew bases). A cost is calculated for each rotation, 
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and once a complete set of flyable rotations has been generated, the 
problem is to select an optimal, feasible subset. 

Depending on whether or not crew members are allowed to be passengers 
on certain flights, optimal covers or optimal partitions yield optimal 
schedules. "Deadheading" is the practice of allowing a crew to travel as 
passengers on certain flights. Planned deadheading can be accommodated 
with the partitioning model. If rotation j concludes a planned deadhead 
on flight segment i, then a., is set to zero rather than one. If 
unplanned deadheading is allowed, however, then a covering problem must 
be solved. 

A typical side condition common to these models is the Crew Base 
Constraint of the form 

X H • X • = M for s = 1, ..., S 

jeD s J J 

where H- is the number of flying hours, per month, associated with 

J 

rotation j; M s is the maximum number of flying hours available per 
month at crew base s; and D $ is the set of rotations flown out of crew 
base s. 

All of the problems of this type were provided by Professor Roy E. Marsten, 
University of Arizona, and are described in [Ref. 23]. TIGER1 and TIGER2 
are examples of crew scheduling problems generated for Flying Tiger 
Airlines. AMERICAN is a large crew scheduling problem generated by 
American Airlines. (All of the airline problems in this study were solved 
without crew base constraints.) The last problem in this class is BUS, a 
driver-scheduling problem generated for Helsinki City Transport as 
described in [Ref. 23]. 
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TABLE 3. AIRLINE CREW SCHEDULING PROBLEM DIMENSIONS 





ROWS 


COLUMNS 


NZEL 


NCE 


MODEL 




1 

TIGER1 | 


160 


636 


4134 


6.5 


SPP 




TIGER2 

i 


107 


2188 


8266 


3.8 


SPP 




1 

AMERICAN | 
1 


95 


9318 


57293 


6.1 


SPP 




BUS | 

1 


56 


530 


3365 


6.3 


SPP 





D. THE MAXIMAL SET COVERING PROBLEM 

Either the facilities location problem or the list selection problem 
can be formulated as a Maximal Set Covering Problem. This problem 
differs from the basic SCP because we no longer require that all rows be 
covered, rather the objective is to cover as many rows as possible 
subject to various constraints. To accomplish this, m continuous variables, 
Y., are added to the basic model to produce the following Mixed Integer 
Program (MIP): 

m 



Min £ Y • 





i i = l 


1 






n 

S.T. £ 

j = l 


a ij x j 


1 = 1, 


(9) 


n 

£ 

j=i 


x j ■ j 




(10) 


n 

£ 

j=i 


CjXj < B 






0 £ 


Ij <1 ' ■ 1, 


. . . , m 




X j G 


tO,l> j - 1, 


• • • ^ n • 
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The constraint (9) limits the number of rows which may be covered by 
specifying that only J columns can be used. The constraint (10) is a 
budget constraint which specifies that as many rows as possible be 
covered for B dollars. The sense of the objective function here is to 
minimize the number of rows left uncovered. 

Another formulation in the same spirit replaces the objective function 

by the familiar Min £ C . X - , and adds the constraint 

j J J 

m 

(11) £ Y < M. 

i=l 1 - 

This formulation seeks the minimum cost set of columns which leaves at 
most M rows uncovered. 

Dwyer and Evans [Ref. 30] have applied a similar formulation to the 
"list selection problem." The list selection problem selects a set of 
subscriber lists which maximizes the proportion of customers reachable 
with direct mail pieces. The rows correspond to magazine subscribers, 
and the columns to individual magazines. Let a-j = 1 if individual i 
subscribes to magazine j, and zero otherwise. 

Moore and Revel le [Ref. 28] have applied this formulation to a 
hierarchical facilities location problem. The rows represent demand 
points and the columns represent various location strategies. The objective 
is to pick those strategies which cover as much of the demand as possible. 

STEINER1 and STEINER2 are two computationally difficult set covering 
problems published by Fulkerson, Nemhauser, and Trotter [Ref. 53]. Each 
row has exactly three non-zero elements, b i = 1 for all i, and the 
objective function is of the unit-cost type. These basic SCP's were 
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extended to MC0VER1 and MC0VER2 respectively, in order to evaluate the 
difficulty of the Maximal Covering Problem. 



TABLE 4. MAXIMAL COVERING PROBLEM DIMENSIONS 





ROWS 


COLUMNS 


NZEL 


NCE 


MODEL 


1 

STEINER1 | 
1 


117 


27 


351 


13 


SCP 


MC0VER1 | 
1 


118 


144 


495 


13 


SCP(ext) 


STEINER2 | 


330 


45 


990 


22 


SCP 


MC0VER2 | 
1 


331 


375 


1365 


22 


SCP (ext) 



Only the budget constraint (10) was added to produce the extended 
problems. The value of B was set so that MC0VER1 seeks the same 
optimal solution as STEINER1, and MC0VER2 seeks the same optimal solution 
as STEINER2 . 
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IV. COMPUTATIONAL DIFFICULTIES 



A. THE BRANCH AND BOUND ENUMERATION METHOD 

It is evident that the SCP/SPP is a powerful model with many useful 
applications. Unfortunately, it is also true that the large-scale 
SCP/SPP is difficult to solve to optimality. In fact, Karp [Ref. 54] has 
shown the set covering problem to be an NP-hard combinatorial problem. 

The solution techniques investigated here involve simplex-based enumeration, 
often called branch and bound. 

Branch and bound is an enumerative method that has been used success- 
fully to optimize a variety of combinatorial problems. The basic prin- 
ciple is to methodically search the set of possible integer solutions 
in such a way that not all possibilities need be explicitly considered. 

The theoretical framework for this study is provided in the following, 
which has been adapted from Geoffrion and Marsten [Ref. 10]. The procedure 
of branch and bound is described in terms of three concepts: separation, 

relaxation, and fathoming. 

1. Separation 

For any optimization problem (P), let F(P) denote its set of 
feasible solutions. Problem (P) is said to be separated into subproblems 
if the following conditions hold: 

1. Every feasible solution of (P) is a feasible solution of exactly 
one of the subproblems. 

2. A feasible solution of any of the subproblems is a solution 
of (P). 
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The procedure is to first make a reasonable effort to solve (P). 

If this effort is unsuccessful, separate (P) into two subproblems, 
thereby initiating what will be called a candidate list of subproblems. 

A reasonable representation of the candidate list may be an "enumeration 
tree" which reveals the partial ordering of consideration among candidates 
in the list. Extract one of the subproblems from the list and call it 
the current candidate problem (CP). If (CP) can be solved, extract a new 
candidate problem from the list; otherwise, separate (CP) and add its 
"descendants" to the candidate list. Continue in this fashion until the 
candidate list is exhausted (i.e., every branch of the enumeration tree 
has been examined). If we refer to the best solution found so far to any 
candidate problem as the current incumbent, then the final incumbent must 
obviously be an optimal solution of (P). 

The technique of separation involves "branching" on a single 
integer variable. For the SPP/SCP where X. is declared to be a binary 

J 

variable, the IIP can be separated into two subproblems by means of the 
mutually exclusive and exhaustive restrictions X. = 0 or X. = 1. An 

J J 

enumeration tree may be visualized with a vertex associated with each 
separation and an edge with each restriction. The tree predecessor 
relationship among vertices reveals the ordering among separations and 
their associated restrictions. This enumeration tree provides a visually 
appealing illustration of the solution sequence. 

2. Rel axation 

Any constrained optimization problem (P) can be "relaxed" by 
loosening its constraints, resulting in a new problem (P^). By far the 
most popular type of relaxation for the ILP is to replace the integrality 
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restriction on the variables of (P) by simple bounds on the variables, 
producing the continuous problem (P^). The only requirement for (P^) 
to be a valid relaxation is that F(P) cF(P^). For the minimization 
problem, this definition implies: 

1* If (Pr) has no feasible solutions, then the same is true of (P). 

2. The minimal value of (P) is no less than the minimal value of (P^). 

3. If an optimal solution of (P ) is feasible in (P), then it is an 
optimal solution of (P). K 

In selecting between the alternative types of relaxation for a 
given problem, there are two main criteria to be considered. First, it 
is desirable for the relaxed problem to be significantly easier to solve 
than the original. Second, one would like (P^) to yield an optimal 
solution of (P), or, failing that, the minimal value of (P^) should be 
as close as possible to that of (P). The distance between the minimal 
values of (P^) and (P) is often described as the "gap," and is used as 
a measure of the "strength" (small gap) or "weakness" (large gap) of the 
relaxation (P^). Unfortunately, the objectives that (P^) be both 
"strong" and easy to solve are antagonistic. In general, the easier (P^) 
is to solve, the greater the "gap" is between the original and relaxed 
problems . 

3. Fathoming 

Let (CP) be a typical candidate problem arising from the attempt 
to solve (P). The ultimate objective in dealing with (CP) is to determine 
whether its feasible region F(CP) may possibly contain an optimal solution 
of (P), and if it does, to find it. If it can be ascertained by some 
means that F(CP) cannot contain a feasible solution better than the 
incumbent (the best feasible solution yet found), this is certainly 
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good enough to dismiss (CP) from further consideration, and we say that 
(CP) has been fathomed. If an optimal solution of (CP) can actually be 
found, we also say that (CP) has been fathomed. In either case, the 
candidate problem has been entirely resolved for purposes of enumeration, 
and no further separations of (CP) are necessary. Thus, the subproblems 
which would arise as restricted descendants of (CP) have been enumerated 
implicitly by either the bounding argument or the feasibility argument. 

Candidate problem (CP) is fathomed if any one of these criteria 
is satisfied: 

1. An analysis of (CP^) reveals that (CP) has no feasible solution. 

2. An analysis of (CP^) reveals that (CP) has no feasible solution 

better than the incumbent. 

3. An analysis of (CP^) reveals an optimal solution of (CP); e.g., an 

optimal solution of (CP^) is found which happens to be feasible in (CP). 

4. General Tree Search Procedure 

STEP 1: Initialize the candidate list with the ILP. Set the incumbent 

value, Z*, equal to infinity. 

STEP 2: STOP if the candidate list is empty. If there exists an incumbent 
then it must be optimal in the ILP, otherwise ILP has no feasible 
solution. 

STEP 3: Select a candidate problem (CP) from the list and solve its 

relaxation (CP^). 

STEP 4: Fathoming Criterion 1. If the outcome of STEP 3 reveals (CP) 

to be infeasible, go to STEP 2. 

STEP 5: Fathoming Criterion 2. If the outcome of STEP 3 reveals (CP) 

has no feasible solution better than the incumbent, Z*, go to 
STEP 2. 

STEP 6: Fathoming Criterion 3. If the outcome of STEP 3 reveals an 

optimal solution of (CP), go to STEP 3. 

STEP 7: Separate (CP) and add its descendants to the candidate list. 

Go to STEP 2. 
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STEP 8: A feasible solution of IIP has been found. If the value of the 

(CP) is less than Z*, record this solution as the new incumbent 
and set Z* = value of (CP). Go to STEP 2. 

The degrees of freedom in STEP 3 provide a host of options. 

Critical among these is the selection mechanism for branch variables. A 
good branching strategy makes it possible to avoid searching large 
portions of the enumeration tree, thus greatly reducing time spent in the 
enumeration process. STEP 3 can also be prohibitively expensive if the 
solution of (CP R ) is not easy to generate from the solution of (CP). 

This step requires either storage of many (CP) solutions or a restriction 
in the sequence for branch solutions. For instance, "fixed order enumera- 
tion" permits branching only on the last element in the candidate list 
previously associated with a (CP). 

Most successful implementations of this general scheme use the 
solution of the IP relaxation of the IIP to obtain the bounds required 
for the branch and bound enumeration. Christofides [Ref. 2] and Marsten 
[Ref. 1] report that most of the current large-scale algorithms use IP to 
obtain bounds for their various enumeration procedures. Exceptions are 
Etcheberry [Ref. 55], who uses "Lagrangian Relaxation" to obtain the bounds, 
and Glover and Mulvey [Ref. 3], who reformulate and use Generalized 
Network Relaxations. 

The success of the branch and bound scheme depends on good 
branching strategies and the ability to obtain good bounds efficiently 
during the tree search. Typically, many IP's must be solved and even 
though the integer requirement is relaxed for each IP restriction, there 
are two serious problems associated with these IP's which make them hard 
to solve: numerical instability and degeneracy. 
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B. NUMERICAL INSTABILITY 



The concept of a basis for the linear program must first be discussed 
in order to explain the computational difficulties. Consider the system 
of equalities AX = b where X is an n-vector, b an m-vector, and A is an 
m x n matrix {m <_n). From the n columns of A, we select a set of m 
linearly independent columns and denote the m x m matrix determined by 
these columns by B. The matrix B is then non-singular and we may uniquely 
solve the equations BX g = b for the m-vector X g , namely, X g = B _1 b. 

If all n - m components of X not associated with columns of B are set 
to equal zero, the solution to the resulting set of equations is said to 
be a basic solution to AX = b with respect to the basis B. 8 is called a 
basis since its m linearly independent columns span the space E m . 

It can be seen from the above explanation that the transformation by 
the basis inverse is necessary to obtain a basic solution. It is also 
true that all large-scale LP systems available today require some form of 
representation of this basis inverse transformation in order to function 
efficiently. One popular representation is the Product Form of the 
Inverse described by Orchard-Hays [Ref. 56]. Another example is the 
explicit sub-kernel representation described by Graves [Ref. 57], 

Unfortunately, the columns of the SPP/SCP are often nearly linearly 
dependent. For instance, a route generator will produce a base route to, 
say, five locations. By substituting alternate locations one at a time 
into the base route, many routes are generated which differ by only one 
or two elements. This can produce an ill-conditioned basis whose inverse 
can contain numbers so large, or so small that after a few iterations 
with real arithmetic, the computer is unable to maintain sufficient 
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significance to provide the numerical stability necessary for the LP 
algorithm to converge, or if it does converge, to produce the true 
optimum. 

To attempt to overcome the numerical instability, it is necessary to 
"clean up" the representation of the inverse by a process known as 
reinversion. There are many different reinversion schemes available, but 
in essence, they all use the original problem data to generate a new 
representation of the inverse which is relatively free from accumulated 
round-off error. Reinversion is computationally expensive, and for the 
SPP/SCP it is often necessary to reinvert quite frequently, thus slowing 
the computation of the bounds needed by the enumeration scheme. 

C. DEGENERACY 

The primal simplex algorithm for the solution of the LP proceeds from 
one basic feasible solution of the constraint set of a problem to another 
in such a way as to continually decrease the value of the objective 
function until a minimum is reached. If one or more of the basic variables 
in a basic solution has value zero, that solution is said to be a deqenerate 
basic solution. For the SPP/SCP, it is important to note that seeking 
the minimum number of columns capable of covering or partitioning the row 
set is exactly equivalent to maximizing the primal degeneracy present in 
the optimal basic solution. 

"Pivoting" is the name applied to the procedure which accomplishes a 
basis exchange. A degenerate basis exchange is one in which a column 
leaves the basis, a new column enters the basis, and the value of the 
objective function does not change. A degenerate pivot, then, exhibits 
the undesirable property that the basis exchange uses up computation time. 
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but no overt improvement in the objective function is realized. (We will 
ignore for the moment that we face a serious theoretical dilemma in 
demonstrating that the simplex method is finite in the presence of 
degeneracy.) Every pivot involves the update of the basis inverse 
representation; therefore, each update usually introduces round-off 
error. As discussed earlier, the basis inverse transformation for an 
ill-conditioned basis accumulates round-off error very quickly. In the 
presence of massive degeneracy, then, it is possible for the convergence 
of the primal simplex algorithm to be prohibitively slow, because an 
excessive amount of time is spent making degenerate basis exchanges and 
performing reinversion. 

Degeneracy and round-off error can also produce a very serious 
phenomenon called "cycling." It is possible that a repeating sequence of 
degenerate basic solutions will be generated such that the simplex 
algorithm cycles endlessly without making progress. Most LP systems 
ignore the threat of cycling, because the repeating sequence is usually 
broken after reinversion "randomly" permutes the row order, thus evoking 
a new solution trajectory. If, however, significant time is spent in a 
cycle while waiting for reinversion to be triggered by the pivot count, 
the internal clock, or by a check on the rounding error, rapid solution 
of the LP will not be possible. 

It has been determined that degeneracy and consequent cycling are 
significant obstacles for the efficient solution of the LP relaxation of 
the SPP/SCP with most of the available LP systems. Massive primal 
degeneracy is present as a consequence of the binary coefficients and the 
fact that for most SPP/SCP's, the right-hand sides of each row are 
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identical. It is this massive primal degeneracy which led Marsten to 
suggest the use of a dual algorithm for the solution of the IP [Ref. 1], 
For the unit-cost objective function, a similar dual degeneracy can also 
be present. Although in most problems with general costs, the dual 
degeneracy is less severe than the primal degeneracy, even objective 
functions with varying costs can produce an effective degeneracy due to 
round-off error. The problem called TRUCK exhibits these characteristics. 
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V. REFORMULATIONS 



A. THE NEED FOR REFORMULATION 

The LP relaxation of the SPP/SCP can be numerically troublesome. 

One way to avoid this difficulty is to seek another relaxation which may 
be easier to solve. The alternate relaxations examined here are based on 
networks. Reformulation of the SPP/SCP as a network makes it possible to 
exploit an efficient solution technology. Network codes such as GNET 
[Ref. 58], and GENNET [Ref. 59] use basis handling procedures which 
require very little real arithmetic, thus avoiding much of the round-off 
error problem. Reformulation comes at the cost of making the problem 
larger, but it is hoped that the superior speed and numerical stability 
of the network approach will more than make up for the increase in 
problem size. 

B. THE FIRST GENERALIZED NETWORK REFORMULATION 

Glover, Hultz, Klingman, and Stutz [Ref. 60] have offered an interesting 
reformulation of any all-binary integer problem as an integer generalized 
network. The Generalized Network formulation is attractive because of 
the emergence of some very fast computer codes for solving generalized 
networks. Glover, et al . , report that their code, NetG, is up to 50 times 
faster than state-of-the-art commercial LP codes on continuous network 
problems. GENNET has proved to be comparable in solution speed for the 
same class of problems. This reformulation, then, (which we will call 
GNIP-1) offers some promise for the solution of the SPP/SCP. It also 
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provides a way of describing the SPP/SCP in network terms which can make 
it easier to formulate and explain the model. 

The SPP 
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The procedure for drawing the network flow diagram is given below. 

Given a SPP(SCP) with m rows, n columns, and NCE(j) non-zero elements 
per column. 
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1. Create a node i for each constraint, i = 1, m; and give each 
node a demand of 1 (SCP >_ b .). 

2. Create a node j for each variable, j = 1, ..., n; where demand = 
supply = 0. 

3. Create a super source node S and give it a supply > 0. 

4. Create a generalized arc (S,j) for each original variable, 
k = 1 , ...» n . 

a. Assign arc a cost of Cj. 

b. Designate arc X k as an integer (0 - 1) arc. 

c. Give arc X^ a multiplier equal to NCE(j). 

5. Create a pure network arc (j,i) for each non-zero element in 
column j. 

a. Assign arc Y^ a cost of zero. 

b. Assign arc Y^ an upper bound of one and a lower bound of zero. 

The GNI P-1 reformulation of the Air Freight Example Problem is displayed 

in Figure 1. 

It can be seen in Figure 1 that if the flow on a generalized arc X^^ . 

is zero, the flows on the continuous arcs Y^ emanating from the variable 

node j are also zero. It is also clear that if the flow on the generalized 

arc is 1, a flow of arrives at the variable node, forcing a flow of 

1 on each continuous arc incident to that node. 

The telling disadvantage of the GN reformulation is the weakness of 

its continuous relaxation. When the integer restriction is removed for 

the arcs X^, there is no assurance that an integral flow will arrive at 

the variable node. So far, this is comparable to the LP relaxation of 

the integer variable X.. The difference between the LP relaxation and 

J 

GN relaxation lies in the continuous arcs Y^ emanating from the variable 
node. Given a fractional supply, there is no assurance that the flows on 
each continuous arc will be the same. If the flows were identical, the 
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ABC PARAMETERS 




VARIABLE NODES (j) CONSTRAINT MODES (i) 



Figure 1. GNIP-1 Reformulation of the Air Freight Example 
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GN relaxation would be the same as the LP relaxation. Unfortunately, 
empirical evidence suggests that the flows differ widely. Furthermore, 
the problem increases as a function of the number of non-zero elements 
per column in the original ILP. Srinivasan and Thompson [Ref. 61] report 
a practical upper bound of three or four non-zeros per column for a 
similar reformulation of set partitioning problems. 

Glover and Mulvey [Ref. 3] state that it is legitimate to manipulate 
the costs incident to a given variable node provided that these costs 
always sum to Cj. This can be interpreted as a form of "Lagrangian" 
manipulation, taking side constraints into the objective function, where 
these side constraints stipulate that the flow on each pure network arc 
incident to each variable node be the same. By linear programming 
duality, there exists some such assignment of costs for which the optimum 
objective function value for the GN is the same as that for the LP 
relaxation. Obviously, the trick is to find an exact or heuristic 
procedure of assigning these costs to "strengthen" the GN relaxation. 
Several attempts were made to distribute costs according to the proportion 
of flows on the first and subsequent GN relaxations, but these attempts 
proved ineffective. 

C. THE SECOND GENERALIZED NETWORK REFORMULATION 

One way to strengthen the GN relaxation is to reduce the number 
of continuous arcs in the reformulation. Glover and Mulvey [Ref. 3] 
have presented another GN formulation for the ILP (GNIP-2) which elimin- 
ates the super source node, the n generalized arcs emanating from it, and 
one continuous arc per variable node. For the ILP with m rows, n columns, 
and NCE(j) non-zero elements per column. 
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Create a node i for each constraint, i = 1, m; and give each 
a supply of one (SCP >_ b^). 

2. Create a node j for each variable, j = 1, ..., n; where demand = 
supply = 0. 

3. Create an arc (i,j) for each non-zero element in the ILP, connecting 
each constraint node to the appropriate variable node. 

a. Select one arc for each j and designate it as an integer (0-1) 
general ized arc X^. 

(1) Assign arc X. a cost of C^. 

(2) Assign arc x£ a multiplier of M k = NCE(j) - 1 

(if NCE is greater than one). 

b. Designate the remaining arcs as continuous generalized arcs Y^. 

(1) Assign arc Y k an upper bound of 1. 

(2) Assign arc Y k a cost of zero. 

(3) Assign arc Y k a multiplier of M k = -1. 

c. If NCE(j) = 1 

(1) Create a slack node S with a supply <_ M. 

(2) Create a continuous arc Y^ (S,j) as in 3b above. 

The GNIP-2 reformulation of the Air Freight Example is displayed in 
Figure 2. 

The above procedure produces the following mathematical program GNIP-2 
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SUPPLY 




Figure 2. GNIP-2 Reformulation of the Air Freight Example 
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Table 5 shows the relative strengths of the GN and LP relaxations, 
plus relevant problem dimensions for a few of the test problems. The 
column labeled TIME OF LP/GN displays the time required to obtain the 
first solution of the LP/GN relaxations, given in IBM 3033 CPU seconds. 

A special version of GENNET [Ref. 59] was used to obtain the GN relaxations 
and the current version of the X System [Ref. 62] was used for the LP 
relaxations. The GN IP-1 relaxations for a few problems were obtained 
with the X System so comparative times will not be given. %0PT is the 
value of the continuous relaxation multiplied by 100 and then divided by 
the value of the optimal integer solution. The closer %0PT is to 100, 
the better the relaxation. 

STEINER1A and STEINER2A are two SCP's created by transposing the 
coefficient matrices of STEINER1 and STEINER2 problems. These problems 
were constructed because no real problems were available having fewer 
than four non-zero elements per column. The GN relaxations have been 
reported to be non-competitive with the LP relaxations when the number of 
non-zeros per column exceeds three or four [Refs. 61]. The results from 
this study indicate that the strength of the GN relaxation is unpredictable 
The GN relaxations for the symmetrical STEINER problems are very competi- 
tive with the LP relaxations. The GN relaxation for BUS, however, was 
unexpectedly bad. The gap for 3US is so great that there is little hooe 
of a reasonable solution trajectory in the enumeration phase. Even the 
relatively easy problems, TIGER1 and TIGER2, produced GN relaxations of 
poor quality. 
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TABLE 5. 



GENERALIZED NETWORK RELAXATIONS FOR SELECTED PROBLEMS 



MODEL 



ROWS/ COLS/ AVG TIME OF * nPT 
NODES ARCS NCE LP/GN * UK 1 
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D. THE GENERALIZED PROCESS NETWORK REFORMULATION 

Some recent work by Koene [Ref. 63] on Processing Networks offers an 
interesting perspective for improving the GNIP formulation. A processing 
network is one for which the flows on arcs going out from (or into) a 
given node are proportional to each other. To achieve this proportional ity 
(in our case, we desire equal flows), a network with side constraints 
must be solved. Several authors have reported some success with solving 
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pure and generalized networks with a "few" complicating constraints 
[Refs. 64, 65, 66]. Unfortunately, the number of side constraints in 
this case grows again with the number of non-zero elements in each column 
of the HP. In fact, so many side constraints are needed in these 
problems that the size of the basis inverse representation which must be 
carried along with the network becomes prohibitive. 

Because the side constraint portion is so large, it was decided to 
view the generalized process network problem as a candidate for either 
generalized upper bounding factorization or network factorization routines 
embedded within an LP system. Generalized Upper Bounding (GUB) refers to 
a set of rows with at most one non-zero in each column. The coefficients 
of each non-zero must be +1 or -1 (or capable of being scaled to +1, or -1). 
Network factorization refers to a set of rows with at most two non-zeros 
in each column, and the non-zeros may be of any value or sign. Brown and 
Wright [Ref. 67] have examined techniques for extracting network structures 
embedded in general LP problems, but the test bed for network factorization 
routines coupled with an Integer Programming System is not yet in place, 
so no results can be reported at this time. A Generalized Upper Bounding 
factorization routine was available, however, and a formulation of the 
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(17) - Y ^ , =0, i = l, ...,n;ks*k'; 

k : tai 1 i k 1 : t ai 1 i 

where \:head j = 1/ M k:head j • 

(14) is the same as for the GN I P-2 . (15) forms a GUB set, the right-hand 

side of (16) would be replaced by _> b. for the SCP, and (17) is the 
side constraint section. 

The number of GUB rows attainable is equal to the number of variable 
nodes in the GNIP. This is an enormous GUB set, but even with the GUB 
rows not considered, the problem is still larger than the original ILP. 

It was predicted that the basis inverse representation obtained from this 
factorization would not be as ill-conditioned as the representation 
obtained from the normal LP bases. However, the LP's with GUB factoriza- 
tion are also very difficult to solve and the results do not indicate 
that this promises to be a competitive approach. 
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VI. COMPUTATIONAL EXPERIENCE 



A. THE COMPUTER 'CODE 

The computer code used for this research is a large-scale optimization 
test bed called the X System or simply XS. XS has been developed since 
1974 [Ref. 62] as a general-purpose optimization system of advanced 
design which serves both as a prototype test bed for research and as the 
fundamental computational foundation of many application packages utilizing 
optimization. XS is designed to solve large-scale optimization problems, 
with special emphasis on mixed integer models. The embedded linear 
programming module has received most of the design effort and exhibits 
many singular features including: 

1. Hyper-sparse data representation [Ref. 68]; 

2. Complete constructive degeneracy resolution [Ref. 57]; 

3. Basis factorization [Ref. 69]; and 

4. Elastic range constraints [Ref. 62]. 

XS consists completely of open FORTRAN subroutines. FORTRAN IV H 
(Extended) OPTIMIZE (2) is the implementation dialect and an IBM 3033 is 
the host computer. All of the problems with the exception of AMERICAN 
were solved interactively under the IBM CMS timesharing system. 

1. Elastic Model 

XS requires that the model be thought of in an extended or 
"elastic" sense. The term elastic comes from the view that no constraint 

ft 

is totally binding, but may be violated at a price, an elastic penalty. 

The feasible region is thereby "stretched" to the degree of elasticity 
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specified by the penalty structure. The extended elastic model appears 
below. 
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Cost Coefficients, 

Constraint Coefficients, 

Lower and Upper Constraint Violation Penalties, 

Lower and Upper Constraint Range Limits, 

Logical "artificial" and "surplus" variables. 
Variables (any of which may be integer), 

Lower and Upper Variable Bounds, 

Is 0, or +1, or -1 (GUB indicators). 

Number of GUB rows. 

Row Dimension, 

Column Dimension. 

2. Hypersparse Data Representation 

Appendix 3 exhibits a specimen of the data inDut format used for 
this research. This particular form exploits the hypersparse data 
structure capaoility of XS since there are very few unique real numbers 
in a SPP/SCP. All of the constraint coefficients are 1 and 0; therefore, 
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it is not necessary to store a real number for each non-zero coefficient, 
but merely its address. Further efficiencies can be realized in a similar 
manner for the unit-cost objective function, and for the right-hand side 
for which b i = 1 for all i . 

3. Primal Dual Algorithm and Degeneracy Resolution 

The representation of the Basis Inverse maintained by XS admits 
the application of the Primal or Dual Algorithm with equal facility. 

This fact makes it possible to use the Dual Algorithm for this problem 
class with no loss of performance with respect to the Primal. As mentioned 
earlier, the Dual is the algorithm of choice because of the massive 
primal degeneracy present in this class of problems. 

It is this equal facility between the Primal and the Dual which 
provides the framework for the degeneracy resolution machinery in XS. In 
Graves' terminology [Ref. 57], when degeneracy is encountered, the 
algorithm is said to be "blocked." The resolution of blocking in either 
the primal or dual algorithm is accomplished by shifting to the alternate 
algorithm when blocking occurs. The alternate algorithm is applied to a 
subproblem of the original problem and at worst we are led to a contracting 
sequence of problems to which we alternately apply the primal and dual 
algorithms. A strict contraction can be assured, and thus in at most a 
finite number of steps, resolution is assured. A complete illustration 
of blocking resolution can be found in [Ref. 70]. 

The degree to which blocking is resolved through this sequential 
nesting of subproblems is controlled by a blocking resolution parameter. 
This parameter can be set so that any degree between no resolution and 
total resolution can be obtained. The parameter also controls the point 
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at which blocking resolution begins in the solution trajectory. This 
means that blocking resolution can be inhibited early in the solution 
trajectory when it may not be efficient to resolve every degeneracy, and 
then enabled as the trajectory nears optimality. 

B. HEURISTICS WITHIN THE EXACT ALGORITHM 

The efficiency of branch and bound algorithms can be improved through 
judicious use of heuristic information that indicates good branching 
strategies to follow. If good feasible solutions (incumbents) can be 
obtained early, then fathoming can occur more quickly and more frequently 
in the search. Also, premature termination of the algorithm will more 
often result in near-optimal or optimal solutions. 

1. The Elastic Heuristic 

A robust technique for obtaining an incumbent has been incorporated 
into the XS enumeration system. Any continuous relaxation of the Elastic 
ILP can be rounded to an integer solution with very little computational 
effort. Further, all such rounded solutions are admissable (feasible in 
the extended elastic sense). The current continuous solution is rounded 
in three passes, each of which selects variables from a class defined in 
terms of 9 , where 0 £ 9 £ .5 and (1 - 9 ) £ X - £ 1 or 0 < X. < 9 . 

J J 

Class 1: Nearly integral (0 £ 9 £ .2). 

Class 2: Fractional (.2 £ e £ .4). 

Class 3: Ambivalent (.4 <_ 9 £-.5). 

The rounding heuristic sequentially exhausts variables from each class 
and rounds using a "minimal regret function," rounding away from the 
worst penalty. 
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In the default elastic enumeration scheme, there are only depth 
and value-motivated fathoming rules. Feasibility plays no direct role 
in the enumeration except via the accumulated cost of the penalties in 
the objective function. Integer solutions and lower bounds of excellent 
quality are empirically produced quite early in the enumeration effort, 
permitting routine early termination of the search based on an optimality 
tolerance or on a maximum depth (permissible number of fixed variables in 
any restriction). Tuning of the method is easily accomplished via these 
two limits and the elastic penalties used to express the underlying model. 

The penalty structure found most effective with this heuristic is 
based upon the number of non-zero row elements in each row of the constraint 
matrix, called NRE(i). It has also been determined that the upper penalty 
P + should be set at one-half the value of the lower penalty P“, in 
order to coerce the heuristic to round up. This forces shallow termina- 
tion with respect to depth fathoming. A penalty constant, P, is set at 
approximately one order of magnitude greater than the largest cost 
coefficient, so that for each row in the SPP/SCP 

Set PT = P/NRE ( i ) . 

Set p + = 1 p7 
e i 2 i* 

Penalties set in this manner "communicate" to the heuristic 
that rows which can be covered in only a few ways are mere important than 
rows with a higher row count. In the enumeration, then, these "important" 
rows are satisfied first, making it possible to avoid many of the alternate 
possibilities available for covering rows with a large row count. 

Table 6 exhibits the computational characteristics of the Elastic 
Method. IP and IPtime indicate the solution value and solution time (in 
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CPU seconds) for the first continuous relaxation of the indicated problems. 
OPT is the optimal integer solution, and ILPtime is the total CPU time in 
seconds required to solve the ILP to the optimum. INPUT/OUTPUT time is 
included in the ILPtime values. Input time includes the time to read in 
the entire problem and accomplish error checking. Output time includes 
the printing of intermediate information plus the time required to 
execute the report writer. The dual algorithm was used for all solutions. 

TABLE 6. COMPUTATIONAL RESULTS FOR THE ELASTIC METHOD 





LP 


LPtime 


OPT 


ILPtime 


STEINER1 


9.0 


0.08 


18 


25.13 


MC0VER1 


0.0 (9) 


0.05 


0 (18) 


0.39 


STEINER2 


15.0 


0.52 


30 


527.08 


MC0VER2 


0.0 (15) 


0.22 


0 (30) 


417.08 


TIGER1 


56406.0 


0.94 


56406 


0.97 


TIGER2 


15098.0 


9.46 


15098 


9.53 


TIGER2a 


15684.0 


10.23 


15684 


10.30 


BUS 


50754.5 


8.18 


61308 


177.44 


AMERICAN 


No LP solution 


after 30 


minutes CPU time 




TRUCK 


No LP solution 


after 30 


minutes CPU time 




TANKER 


75941.4 


34.05 


75941.4 


44.62 


It is 


interesting to note 


that the 


maximal set covering reformul 



tions of the two STEINER problems as MC0VER1 and MC0VER2 produced easier 
ILP'S. The budget constraints for these oroblems were constructed so 
that the maximal covering problems would seek the same optima as the SCP 
formulations. TIGER2a is a variant of TIGER2 obtained by eliminating 
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some of the columns found in the optimal solution of TIGER2 . Marsten has 
indicated in [Ref. 23] that problems in this general class typically have 
many nearly optimal integer solutions. TIGER2a supports this observation: 

8 of the 33 optimal columns for TIGER2 were removed to construct TIGER2a, 
and the solution is only 3.9 percent worse. 

C. THE ELASTIC METHOD WITH STARTING SOLUTIONS 

As indicated in Table 6, the LP relaxations for AMERICAN and TRUCK 
were not solved within 30 minutes. After many unsuccessful attempts to 
overcome the numerical instabilities peculiar to these LP's, various 
combinatorial and heuristic methods for obtaining starting solutions were 
considered. Given a feasible, suboptimal solution of "reasonable" quality, 
it should be possible for the elastic method to find the optimal solution 
in few enough iterations to avoid many of the numerical difficulties. 

D. LOGICAL REDUCTION 

Garfinkel and Nemhauser [Ref. 71] have given a set of simple rules 
for logically reducing a problem matrix for the SPP/SCP with b. = 1 for 
all i. Although logical reduction is not guaranteed to provide a starting 
solution, substantial reductions in problem size can greatly improve the 
numerical behavior of many problems, especially if the problems were 
originally generated with many inherent redundancies. This explanation 
of logical reduction provides insight into the structure of the SPP/SCP 
and is valuable for understanding and evaluating some of the heuristics 
used for constructing starting solutions. 

Not all of the rules were chosen for implementation because it is 
felt that their inclusion does not return sufficient reduction to justify 
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the computational expense of using them. The implementation scheme used 
is a non-backtracking (polynomial time) routine involving easy binary 
comparisons of rows and columns. If significant reduction is achieved 
after one application, the scheme is applied iteratively until no more 
improvement is obtained. Those rules which were implemented are based on 
the notions of row and column dominance. For example, if two columns are 
equal element-by-element (ColA = ColB), and one has a cheaper cost, then 
the more expensive column is dominated by the cheaper and may be deleted. 
Not so obviously, if RowA is wholly contained as a subset of RowB 
(RowA _< RowB), then RowB may be deleted, since any column which covers 
RowA will also cover RowB. The four rules used are: 

Rule 1: Delete all null columns and null rows. 

Rule 2: Column Dominance. 

A. SCP : If ColA > ColB, and CostA _< CostB, delete ColB. 

B. SPP: If ColA = ColB, delete the more expensive column. 

Rule 3: Row Singleton. 

A. Delete the row covered by only one column. 

B. Fix the variable associated with the singleton to one. 

C. Delete all rows covered by the fixed variable. 

D. SPP: Delete all columns in the rows deleted by 3C. 

Rule 4: Row Dominance. 

A. If RowA _> RowB, delete RowA. 

B. SPP: If RowA is deleted, also delete every column in RowA 

which is not included in RowB. 

The column and row reduction schemes can be applied iteratively, since 
after the first application, additional dominance may be discovered. 
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Consider the constraint matrix of the Air Freight Example below. 
Application of the reduction rules would achieve the following results 



Los Angeles 
San Francisco 
San Jose 
Denver 
Port! and 
Seattle 
San Diego 

Costs 



R1 R2 R3 R4 R5 R6 R7 



1 


0 


0 


0 


0 


0 


0 


1 


1 


1 


0 


0 


0 


0 


1 


1 


1 


0 


0 


0 


0 


0 


0 


1 


1 


1 


0 


0 


0 


0 


0 


1 


1 


0 


0 


0 


0 


0 


1 


1 


1 


0 


0 


0 


0 


0 


1 


1 


1 



0 0 0 0 6 7 4 



For the SPP, 

Rule 3: A. Delete Los Angeles. 

B. Fix R1 to one and delete it. 

C. Delete San Francisco and San Jose. 

D. Delete R2 and R3. 

Rule 4: A. Denver > Portland, delete Denver 

Seattle > Portland, delete Seattle 
3. Delete R6. 

The resulting reduced problem is 





R4 


R5 


R7 




Port! and 


1 

| 1 
1 


1 


0 


1 

1 

| 


San Diego 


1 o 
1 


1 


1 


| Solution = Rl, R4, R7 
1 


Costs 


1 

1 o 
1 


6 


4 


1 

1 

1 
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The action of deleting a row or column does not necessarily mean 
that the row or column has been eliminated from the problem. In the 
reduced air freight example, the removal of Denver and Seattle from 
consideration means simply that any column which covers Portland will 
automatically cover Denver and Seattle; therefore, Portland is the 
critical row. The same observation holds for San Diego. Table 7 exhibits 
the degree of reduction achieved and the computation times for two of the 
test problems. % RED is derived by dividing the number of rows/columns 
deleted by the number of original rows/columns, respectively. No reduction 
was achieved for STEINER1, STEINER2 , BUS, TIGER2, TANKER, and TRUCK. 

TIME indicates the total time in CPU seconds required to achieve the 
indicated reduction. 

TABLE 7. LOGICAL REDUCTION RESULTS FOR SELECTED PROBLEMS 





ROWS 




% RED 




COLS 




% RED 


ITERATIONS 


TIME 




1 

TIGER1 | 

1 


160 


1 

1 

1 


50 


1 

1 

1 


636 


1 

1 

1 


13.8 


1 1 
1 1 1 
1 1 


27.3 




1 

AMERICAN | 


95 


1 

1 


0 


1 

1 


9318 


1 

1 


47.0 


1 1 1 


1743.0 





The tremendous column reduction achieved on AMERICAN is due to the 
aDsence of crew base constraints. During the original generation of this 
problem, entire sets of columns were replicated and were designed to be 
kept distinct by the mutually exclusive crew base constraints. Unfortu- 
nately, the original crew base constraints are no longer available. Once 
the reduction for AMERICAN is explained, then, the benefits obtainable 
by logical reduction do not appear to justify its computational expense. 
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The reduction routine could be a valuable aid, however, in validating 
the performance of column generation programs, and could also be of use 
in the first screening of problem data. 

E. A GREEDY HEURISTIC ALGORITHM 

Baker [Ref. 72] describes a heuristic algorithm developed to exploit 
the structure of large airline crew scheduling problems formulated as SCP's. 
The approach is to successively augment the solution set for the SCP by 
selecting columns which exhibit the minimum average cost per uncovered row. 

STEP 1: Initialization. Solution set = 0. Row Coverage Set = 0. 

STEP 2: Selection. Choose the column XT that has the minimum 
average cost per uncovered row. 

STEP 3: Update. Add XT to the solution set. Update the row 

J 

coverage set to reflect the rows covered by XT. If all 
rows are covered, STOP. Otherwise GO TO STEP 2. 

The worst case bound for the solution obtained from this procedure is 
reported by Baker to be: 

E 

SOLN(Heuristic) < SOLN(Opt) £ 1/k , 

k=l 

where E is the maximum number of non-zero elements in any column in the 
solution set. This means that for a set of columns with from four to 
ten non-zero elements, the worst solution obtainable from the heuristic 
is from two to three times larger than the value of the optimum. ~ab’e 3 
indicates, however, that the actual performance of the heuristic can be 
much better than the worst case bound. The column labeled START TIME 
indicates the time required to obtain the starting solution. %0PT is 
the percentage difference between the starting solution and the optimal 
solution. This simple heuristic will provide a classically feasible 
solution of reasonable quality for the SCP. The solution for SPP 
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will not be feasible, since the heuristic will treat the SPP as a SCP and 
overcovering of the rows will result. 

TABLE 8. STARTING SOLUTIONS OBTAINED FROM THE BAKER SCP HEURISTIC 

START 

TIME % OPT 



STEINER1 


0.07 


5.6 


STEINER2 


0.06 


6.7 


TRUCK 


8.47 


9.19 



Because these starting solutions are of limited value for set parti- 
tioning problems, this line of study was terminated in favor of a new 
solution technique developed by Brown and Graves [Ref. 73]. The new 
technique uses a block partitioning scheme to exploit the intrinsic 
structure of the SPP/SCP. 

F. BLOCK PARTITIONING ALGORITHM 

Christofides [Ref. 3] describes a block partitioning structure 
attributed to Pierce [Ref. 15] which has been used by many researchers for 
this problem class. To place the SPP/SCP in block form, we make up m 
blocks of columns, one block for each row. Block i will comprise of 
exactly those columns which cover row i, but do not cover rows 1 to i-1. 
This produces a staircase matrix with zeros to the right of the staircase. 
The blocks in general can be arranged in tableau form as shown in Figure 
3, although one or more blocks may be nonexistent in a particular problem. 

Marsten [Ref. 1] determined experimentally that sorting the rows by 
increasing length gave consistently good results for his algorithm which 
favors the shorter rows for early branching. The row with the fewest 
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j Block 1 j Block 2 j Block 3 


Block 4 j ... j Block m 


Row 1 | 11 | 0 

Row 2 | | 111 | 0 


1 1 1 

1 1 1 

0 | 1 1 


Row 3 | | j 1111 


Row 4 | 0 or 1 | j 

| | 0 or 1 | 

. | j | 0 or 1 

Row m | | 


1111 | etc. | | 


1 1 1 

1 1 1 

0 or 1 j j 0 j 

1 1 mu 1 



Figure 3. Block Partition Structure 

l's is placed at the top, and the row with the most l's ends up at 
the bottom. (This ordering by row length is also depicted in Figure 3.) 
Intuitively, it seems reasonable that a row which can be covered in only 
a few ways is more critical than a row which can be covered in many ways, 
and should therefore be dealt with first. This row orderinq scheme was 
chosen for implementation. 

Once the problem has been placed into the block structure, three ways 
of ordering columns within blocks are found in the literature: (1) 

heur istical ly by increasing or decreasing cost [Ref. 3]; (2) lexicographic- 
ally [Ref. 1]; and (3) randomly (i.e., columns are not explicitly reordered 
once blocking has been accomplished). The algorithm developed by Brown 
and Graves does not presently require that columns be specially ordered 
within blocks. 

The Block Partitioning Algorithm can be applied to both the initial 
LP Relaxation and subsequently to the integer enumeration. For the IP, 
the problem is first divided into an arbitrary number of block grouos 
forming a set of distinct LP subproblems. The first LP subproblem (LP 1 ) 
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is solved to optimality, the next LP subproblem (LP 2 ) is dynamically 
appended to the first, and then the two subproblems are solved as one. 

The third LP subproblem is appended to the solution of LP 1 2 and the 
procedure continues until all subproblems have been appended and a global 
solution has been obtained. 

Many variations of this procedure are evident. TRUCK has been solved 
by dual relaxation of the aggregation of successive LP solutions. 
Particular problems exhibit great sensitivity to tuning of this procedure. 
In particular, a few complicating columns are frequently the principal 
cause of computational difficulty. 

Table 9 presents results for the Block Partitioning Algorithm applied 
to the LP only. Subsequent to the solution of the global LP, the elastic 
enumeration scheme was used to obtain optimality. OPT TIME indicates the 
total time required to achieve optimality. I/O time is included in the 
values. All times are in IBM 3033 CPU seconds. 

TABLE 9. RESULTS FOR THE BLOCK PARTITIONING ALGORITHM 





NUMBER OF 
BLOCKS 


BLOCK 

TIME 


NUMBER OF LP 
SUBPROBLEMS 


GLOBAL 
LP Time 


OPT 

TIME 


BUS 


48 


0.001 


4 


10.76 


174.50 


TIGER2 


106 


0.04 


4 


10.98 


15.98 


TRUCK 


194 


0.15 


4 


210.92* 


373.72* 


TANKER 


50 


0.11 


4 


9.14 


22.56 


AMERICAN 


79 


0.24 


4 


132.52 


536.41 



*Primal Feasible, Suboptimal solutions. 
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’’’he analogous procedure whereby each subproblem is solved to an 
integer optimum did not compare favorably with the default elastic 
method. This line of study was therefore terminated and another concept 
called the "Refinement Procedure" was investigated. 

G. COLUMN GENERATION AND PROBLEM REFINEMENT 

There is no substitute for possessing the column generating program 
when attempting to solve the large-scale SPP/SCP. Attempting to solve 
large, static problems in a vacuum is doomed to be either exDensive or 
impossible. The column generator and the optimization system work best on 
these problems when they are intimately coupled so that each module can 
communicate with the other. In this way, the optimizer works on sma^e^ 
problems and the column generator produces only those columns which can 
contribute to a better solution. 

Graves [Ref. 73] has developed a refinement procedure for the SPP 
which attempts to capture, for a static problem, some of the caoabilities 
which are present when the column generator is in hand. This procedure 
results in a relaxation of the original problem, but it is a workable 
scheme which can produce acceptable solutions. The procedure is 
implemented as follows: 

STEP 1: Solve the SPP as a SCP. Identify rows with multiple covers. If 

no multiple covers exist, STOP. 

STEP 2: For each column covering a row which is multioly covered, 

generate a new column which does not include rows with multiple 
covers. Original columns are assigned a "cost per row covered" 
which is used to give new columns reduced costs proDortionate to 
the number of rows deleted. Go to STEP 1. 

Table 10 displays the performance of the refinement procedure on two of 

the more difficult SPP's, 3US and AMERICAN. Tie refinement procedure is 
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used in conjunction with the Block Partitioning Algorithm set for four LP 
subproblems. # REFINEMENTS gives the number of refinement iterations, # 
COLUMNS GENERATED gives the total number of new columns generated by the 
procedure, and OPT TIME is the total time in CPU seconds required to 
achieve the optimal partition. 

TABLE 10. RESULTS FOR THE REFINEMENT PROCEDURE 





# REFINEMENTS 


# COLUMNS GENERATED 


OPT TIME 


BUS 


5 


123 


12.06 


AMERICAN 


2 


26 


94.60 



Comparing the results from Tables 9 and 10, it is obvious that the 
refinement procedure produces a solution much more quickly than the other 
methods. It is difficult to compare the solution values, because the 
true cost for each column generated by the procedure is not known. The 
costs assigned here to the new columns are representative, however, of 
those which would be assigned by the column generator. 
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VIII. CONCLUSIONS AND RECOMMENDATIONS 



It has been shown that practical, large-scale set covering and set 
partitioning problems can be solved optimally and efficiently. The Block 
Partitioning Algorithm is clearly the most robust and most successful 
technique examined in this study, and its efficiency compares favorably 
with published solution technologies for this problem class. 

Unfortunately, the implementation of the Generalized Network Reformula- 
tion for the SPP/SCP did not perform as well as expected. The continuous 
relaxation of the Integer Generalized Network is too weak to be of 
much practical use; therefore, this technique does not hold much promise 
for the rapid solution of set covering and set partitioning problems. 

Much work remains in improving the integer enumeration scheme subse- 
quent to the solution of the linear programming relaxation. The default 
elastic method works well, but additional research is needed to improve 
its performance. The coupling of the column generating program with the 
optimizer is a concept which holds great promise for the efficient 
solution of problems in this class. As illustrated by the Refinement 
Procedure results, spectacular reductions in solution time can result 
from implementing this idea. 

The proposed standard data input format displayed in Appendix 3 makes 
data manipulation both easy and convenient, and dramatically reduces 
storage requirements for any mathematical programming system capable of 
exploiting it. A tape containing all of the test problems in this format 
is available to those who wish to continue research in this area. 
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APPENDIX A. DESCRIPTION OF SELECTED APPLICATIONS 



A. POLITICAL DISTRICTING (SPP) 

Let the rows represent m basic population units (such as counties, 
census tracts, etc.)* Let the columns represent n possible districts or 
subsets of the population units such that each potential district meets 
the requirements on population size, contiguity, compactness, and so 
forth. A side cardinality condition (9) usually imposed is that there be 
exactly J districts. If C. is some ordinal measure of the unaccept- 

J 

ability of district j, then an optimal solution to the SPP yields an 
optimal districting plan. 

B. COLORING PROBLEMS (SPP) 

Consider the problem of coloring a map so that no two adjacent areas 
have the same color. Let there be m such areas. A column j is generated 
if no two elements of column j correspond to areas having a common 
boundary. If all costs are unity, an optimal partition indicates the 
minimum number of colors needed. A direct application of this concept is 
the problem of minimizing the number of distinct radio frequencies 
necessary to provide service in several geographical areas. A column j 
is generated if no two elements of column j correspond to areas with 
overlapping frequencies. 
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C. NUCLEAR AND CONVENTIONAL TARGETING 



1. Conventional Scenario (SCP) 

Let each row i represent a target which must be engaged at 
least times. Let each column j represent a weapons system capable 
of engaging some subset of the m targets within a specified time period. 

If the cost coefficients reflect the expected effectiveness of a given 

weapons system on the targets covered by column j, the optimal solution 
will yield the most effective subset of weapons systems capable of 
accomplishing the mission. If columns are generated so that k missions 
are possible for each of p weapons systems, then a constraint will be 
necessary to ensure that each weapon system is given only one mission in 
the optimal solution. The maximal SCP formulation can also be used here 

to find the combination of weapons systems which can engage some specified 

proportion of targets. 

2. Nuclear Scenario (SPP) 

Let each row represent a target which must be engaged only once 
in a given time period (to avoid fratricide, for instance). Let each 
column j represent a weapons system capable of engaging some subset of 
the m targets (i.e., various footprint alternatives). If a unit-cost 
objective function is used, the optimal solution will yield the minimum 
number of weapons systems needed to destroy all the targets. 

D. INFORMATION RETRIEVAL (SCP) 

Consider the problem of retrieving information from n files, where 
the file is of length C.. Suppose that m requests for information 

J 

are received. Each unit of information is stored in at least one file 

j indicated by a.. = 1. An optimal cover yields a subset of files that 
^ J 
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minimizes the maximum total length which needs to be searched in order to 
guarantee retrieval of all the information. 

E. CYCLIC SCHEDULING (SCP) 

A fundamental problem of cyclic staffing is to size and schedule a 
minimum cost workforce so that sufficient workers are on duty during each 
time period. The k,m cyclic scheduling problem models the task of 
finding the minimum cost assignment of workers to shifts so that each 
person works k time periods consecutively out of m, and at least fa- 
workers are present during the day i. A sample tableau for the 5,7 
cyclic scheduling problems is shown below. 



X x X 2 X 3 X 4 X 5 X 6 X y RHS 



MON 

TUE 

WED 

THU 

FRI 

SAT 

SUN 



10 0 11 
110 0 1 



1 1 
1 1 



1110 0 



1 



1 1 1 



0 0 1 



1 1 
0 1 



1110 0 
11110 



0 0 



1 1 1 



0 < X. < u.. and Integer for all j 

J _ J 



>_ b 

1 b 

>_ b 

1 b 
1 b 

1 b 

> b 



1 

2 

3 

4 

5 

6 
7 



The above formulation is not a binary program; therefore, to transform 
it into one, two alternative techniques can be used. If it is desired to 
distinguish between individual workers, the above seven columns can be 
replicated for each worker. An additional side constraint will be 
necessary to ensure that a worker is not selected to work more than one 
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shift. This approach could result in a large number of columns, so 
another a’ternative is to use a binary representation in place of each 
Xj . For any reasonable value of Uj, this is a feasible technique. 

For ’arge values of U., the requirement that the X. be integer is 

J J 

probably not worth the computational expense, and the problem should be 
solved as a continuous LP. Additional considerations such as overtime, 
days-off scheduling, part-time workers, over- and under-staffing, etc., 
are discussed in [Ref. 51]. 

F. SALES TERRITORY DESIGN (SPP) 

A problem facing sales managers is how to identify which customers 
should be included in a given sales territory, and how to determine the 
best call frequencies for individual customers, in short, how to allocate 
a given amount of the time of several salesmen to several hundred prospec 
tive customers so as to maximize sales. Let the rows be customers. Let 
the columns represent p sets of candidate territories, one for each of 
the p salesmen. Let the costs reflect the potential sales response 
evaluations for a particular salesman in territory j. A side constraint 
is necessary to ensure that only one territory is picked from each of the 
p sets. The requirement that a customer can appear in one and only one 
sales territory makes this the SPP. 

The generation of candidate territories is a difficult Drocess in 
itself. Shanker, et al . [Ref. 8] suggest a procedure which involves 
solving a series of integer programs. One ILP selects territories wh’ch 
maximize demand potential subject to a series of workload, stratification 
and compactness constraints. This set of territories is in turn evaluate 
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in another ILP which maximizes a piecewise linear response function 
subject to calling frequency constraints. Subjective considerations can 
be included at various points in the process to help further reduce the 
number of candidate territories which finally appear in the SPP. 
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APPENDIX 3. PROPOSED DATA INPUT FORMAT 



A very large part of the time invested in this research has been 
spent manipulating and entering problem data which came to the author in 
many different forms. The sheer size of the data sets made them extremely 
unwieldy; therefore, it was decided early on that a compact format for 
these problems could make data manipulation both easy and convenient, and 
would encourage other researchers to adopt this format as a standard. 

The format chosen has many advantages for large-scale problems. 

1. It is compact, listing only problem dimensions, constraint ranges, 
cost coefficients, and coefficient addresses. This not only 
’■educes Input/Output time, but makes it possible to handle quite 
large data sets under interactive, time-sharing systems such as ISM 
CMS. 

2. Storage requirements are easily calculated. Problem dimensions 
are known immediately after reading the first card image. This 
eliminates the need to make multiple passes of the data, or to guess 
at the problem size, as is the case with MPS format [Ref. 74]. 

3. Data Generation Programs are simplified. Row and column labels are 
accommodated, but they are not orimary keys, thus avoiding alpha- 
numeric manipulations with symbol tables. 

4. Column manipulation of data input is made easy since ail information 
for each column is contiguous. 

5. This column format is easily generated by commercially available V PS) 
problem generation systems. 

The data input format consists of three sets of card images: 

1. Problem Dimensions. Format (316) (One Card) 

a. M = Number of Rows 

b. N = Number of Columns 

c. NZEL = Number of non-zero Elements. 



66 



2. Constraint Ranges. Format (2A4, 2E16.8) (M Cards) 

a. IR = Row Index 

b. RL = Lower Range Limit 

c. RU = Upper Range Limit. 

3. Column Data. (N or More Cards) 

a. The number of cards needed depends upon the number of non-zero 
elements in each column (= NCE) . The format for the first 
column card is (2A4, F14.3, 1015). 

1. JC = Column Index 

2. C- = Column Cost Coefficient 

J 

3. NCE = Number of Non-Zero elements in the column 

4. IR = Row Addresses of Non-zero Coefficients. 

b. If NCE is greater than 9, additional column cards are needed 
to hold the row addresses for that column. The format for 
additional column cards is (20x, 1015). 
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TABLE 11. INPUT DATA FOR STEINER1 
AN EXAMPLE IN PROPOSED STANDARD FORMAT 



117 27 351 

1 0. 1OOO0OOOD+O 1 

2 0. 10000090D+C 1 

3 0. 10000000D+0 1 

4 0. 10000000D+G 1 

5 0. 10000000D+0 1 

6 0. 10000000D+C 1 

7 0. 1000000 02*0 1 

8 0. 100000090+0 1 

9 0. 10000009D+0 1 

1 0 0. 100030 OOD+O 1 

1 1 0. 10009000D+0 1 

12 0. 10000000D+G 1 

13 0. ICOOOOOOD+O 1 

14 0. 1 00 00 000 D + 0 1 

15 0. 10000000D + 9 1 

16 0. 10000000D+0 1 

17 0. 10000000D + 0 1 

18 0. 10000009D + 0 1 

19 0. 10000000D+0 1 

20 0. 100000300*0 1 

21 0. 10000003D + 0 1 

22 0 . 100000000 + 0 1 

23 0. 100000000 + 0 1 

24 0. 1 3000000D+G 1 

25 0. 100000000*0 1 

26 0. 1 OOOC OGOD+O 1 

27 0. 10000030D + 0 1 

28 0. ICOOOOOOD + O 1 

29 0.190000000+01 

30 0. lOOOOOCOD+O 1 

3 1 0. 100000000 + 0 1 

32 0. lOOOOOOOD+C 1 

33 0. 10000000D+0 1 

34 0. 10000000D+0 1 

35 0.100000000+01 

36 0. 19C00009D+O 1 

37 0. 10000000D+0 1 

38 0. 1Q003000D + 0 1 

39 0. 10000000D + 0 1 

40 0. 10000030D + C 1 

4 1 0. 100000000 + 0 1 

4 2 0. 190 00 030D + 0 1 

43 0. 1 00 00 OOOD + 0 1 

44 0. 1 OOCOOOQD + 9 1 

45 0. lOOOOOCOD+O 1 

46 0. 100C03COD+C 1 

47 0. 10000030D + C 1 

48 0. 1 OCOOOOOD+G 1 

4 9 0. 100 000090 + 0 1 

50 0. 100003000 + 0 1 

5 1 0. 10000900D + 0 1 

52 0. 10000000D+0 1 

53 0. 100000000 + 0 1 

54 0. 1 OOGOOOCD + C 1 

55 0. 10000000D + 0 1 

56 0. 10000000D+0 1 

57 0. 10000000D+0 1 

58 0. 100000000 + 0 1 

59 C. ICOOOOOOD + O 1 

60 0. lOOOCOOOD+O 1 



0. 1 0000000D+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 09000000+21 
0. 1 OOOOOOOD+2 1 
0. 1 OGOO9C0D+2 1 
0. 1 00030000+2 1 
0. 1 00000000+2 1 
0. 1 OOGOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 00000000+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
«). 1 OOOOOOOD + 2 1 
0. 1 00000000+2 1 
0. 1 OOCGOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD + 2 1 
0. 1 0000900D+2 1 
0. 1 00000000+21 
0. 1 GOOGGOOD+2 1 
0. 1 03C0000D+2 1 
0. 1 OOOOOGOD+2 1 
0. 1 OOOOC 00D + 2 1 
9. 1 OOOOOOOD + 2 1 
0. 1 OOOOOOOD + 2 1 
0. 1 0300300D+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOGOC OOD + 2 1 
0. 1 00000000 + 2 1 
0. 1 OOOOOOOD + 2 1 
0. 1 OOOOOOOD + 2 1 
0. 1 OOGGO 30D + 2 1 
0. 1 33000009 + 2 1 
0. 1 00 00 GOOD + 2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 00 OOC OOD + 2 1 
0. 1 OOOOOOOD+2 1 
9. 1 00000 000 + 2 1 
0. 1 C000GQ0D + 2 1 
0. 1 OCJOOOOD + 2 1 
0. 1 9CU0300D+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 3000000D + 2 1 
0. 1 OOOOOOCD + 2 1 
0. 1 0000 0 OOD + 2 1 
0. 1 0 0000 GOD + 2 1 
0. 1 OOOOC OOD + 2 1 
0. 1 OOOCCOOD+2 1 
0. 1 OCOOO OOD + 2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 00 JOuOOD+2 1 
0. 1 OOOOOCOD+2 1 
0. 1 OOOOOCOD+2 1 
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6 1 0. 10000000D+0 

62 0. 10000000D+0 

63 0,100000000+0 

64 0. 10000000D + 0 

65 0. 10000000D + 0 

66 0. 10000000D+0 

67 0, 100000000 + 0 

68 0 , 100000000+0 

69 0. 10000000D + 0 

70 0,100000000+0 

71 0. 100000000 + 0 

72 0. 100000000+0 

73 0. 10000000D + 0 

74 0, 1 00000000+0 

75 0. 100000000+0 

76 0. 10000000D+0 

77 0.100000000+0 

78 0. 10000000D+0 

79 0. 1COOOOOOD + C 

80 0. 10000000D + 0 
8 1 0 . 100000000+0 

82 0. 10000000D+0 

83 0. 10000000D + 0 

84 0. 1 00 00 GOOD +0 

85 0. 1 00 00 00 OD + 0 

86 0. 10000030D+0 

87 0. 1OOOO0OOD+O 

88 0. 1 OOOOOOOD+O 

89 0. 1 OOOOOOOD + O 

90 0. 10000000D+0 

9 1 0. 10000000D + 0 

92 0, 1 00 00 GOOD +0 

93 0. 10000000D + 0 

94 0. 1OOC0OOOD+O 

95 0. 1 00 00 0 OOD + 0 

96 0. 1 OOOOOOOD + O 

97 0. 1 00 00 0 OOD + 0 

98 0. 10000000D+0 

99 0. 10000000D + G 

100 0. 1 OC 00 00 OD + 0 

10 1 0. 100000000 + 0 

102 0. 1 00 00 00 0 D+0 

103 0. 10000000D + 0 

104 0. 10000000D+0 

105 0. 1000C000D+0 

106 0. 1 00 00 000 D+0 

107 0. 10000000D+0 

108 0. 100000000 + 0 

109 0. 100 00000D + 0 

110 0 . 100 00000 D +0 

111 0. 1 00 OOOOOD + O 

112 0. 1 OOOOOOOD+O 

113 0. 1 00 0000 OD + 0 

114 0. 1 00 OOOOOD + O 

1 15 0, 10000000D + 0 

116 0. 100 OOOOOD+O 

117 0. 100 OOOOOD + O 



0. 1 00000 QOD + 2 1 
G- 1 00000 00D + 2 1 
0. 1 00000000*2 1 
0. 1 00000 OOD + 2 1 
0. 1 00000 00D + 2 1 
0. 1 0000 0 00D + 2 1 
0. 1 00000 QOD + 2 1 
0, 1 00000 OOD + 2 1 
0. 1 OOOOOOOD+21 
0. 1 0000 0 00D + 2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OQOOOOOD+21 
0. 1 00C0Q00D+2 1 
0, 1 OOOOOOOD+2 1 
0, 1 00000000 + 21 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD + 2 1 
0. 1 OOOOOOOD+2 1 
0- 1 OOOOOOOD+21 
0. 1 OOOOOOOD+21 
0. 1 00000000+21 
0. 1 00000 OOD + 2 1 
0, 1 OOOOOOOD + 21 
0. 1 OOOOOOOD + 21 
0. 1 0000000D + 2 1 
0. 1 00000000 + 2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0, 1 0000000D+2 1 
0. 1 00000000 + 2 1 
0.1 0000000D+2 1 
0. 1 0 0000 00 D+ 2 1 
0. 1 OOOOOOOD + 21 
0. 1 OOOOOOOD+21 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD + 2 1 
0.1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+21 
0. 1 OOOOOOOD + 2 1 
0.1 00000000+21 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+21 
0. 1 OOOOOOOD+2 1 
0. 1 09000000+2 1 
0. 1 OOOOOOOD+2 1 
0. 1 OOOOOOOD+21 
). 1 OOOOOOOD + 2 1 
0. 1 OOOOOOOD+21 
0. 1 OOOOJOOD+2 1 
0. 1 OOOOOOOD+2 1 
3. 1 00000000+2 1 
0. 1 OOOOOOOD+2 1 
0, 1 OOOOOOOD + 2 1 
0. 1 0000000D+2 1 
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